Lifei/agent browser by lifeizhou-ap · Pull Request #8032 · block/goose

lifeizhou-ap · 2026-03-20T12:35:38Z

* main: (22 commits)
  feat: add gemini-acp provider, update docs on subscription models + improvements to codex (#8000)
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
  fix: overwrite the deprecated googledrive extension config (#7974)
  refactor: remove unnecessary Arc<Mutex> from tool execution pipeline (#7979)
  Revert message flush & test (#7966)
  docs: add Remote Access section with Telegram Gateway documentation (#7955)
  fix: update webmcp blog post metadata image URL (#7967)
  fix: clean up OAuth token cache on provider deletion (#7908)
  fix: hard-coded tool call id in code mode callback (#7939)
  Fix SSE parsers to accept optional space after data: prefix (#7929)
  docs: add GOOSE_INPUT_LIMIT to config-files.md (#7961)
  Add WebMCP for Beginners blog post (#7957)
  Fix download manager (#7933)
  Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (#7626)
  ...

.github/workflows/pr-smoke-test.yml

* main: (37 commits)
  fix: handle reasoning content blocks in OpenAI-compat streaming parser (#8078)
  chore(acp): build native packages on latest mac (#8075)
  Display delegate sub agents logs in UI (#7519)
  Update tar version to avoid CVE-2026-33056 (#8073)
  refactor: consolidate duplicated dependencies into workspace (#8041)
  tui: set up for publishing via github actions (#8020)
  feat: feature-gate local inference dependencies (#7976)
  feat: ability to manage sub recipes in desktop ui (#6360)
  Tweak the release process: no more merge to main (#7994)
  fix: gemini models via databricks (#8042)
  feat(apps): Pass toolInfo to MCP Apps via hostContext (#7506)
  fix: remove configured marker when deleting oauth provider configuration (#7887)
  docs: add vmware-aiops MCP extension documentation (#8055)
  Show setup instructions for ACP providers in settings modal (#8065)
  deps: replace sigstore-verification with sigstore-verify to kill vulns (#8064)
  feat(acp): add session/set_config and stabilize list, delete and close (#7984)
  docs: Correct `gosoe` typo to `goose` (#8062)
  fix: use default provider and model when provider in session no longer exists (#8035)
  feat: add GOOSE_SHELL env var to configure preferred shell (#7909)
  fix(desktop): fullscreen header bar + always-visible close controls (#8033)
  ...

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1fc2bf84b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

* Open a pull request for review
* Mark a draft as ready
* Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T01:41:30Z

ui/desktop/tests/e2e-tests/scripts/replay.sh

+
+  STEP=$((i + 1))
+  echo "[$STEP/$TOTAL] agent-browser ${GLOBAL_ARGS[*]} ${ARGS[*]} --timeout $CMD_TIMEOUT_MS"
+  if ! timeout "$CMD_TIMEOUT" pnpm exec agent-browser "${GLOBAL_ARGS[@]}" "${ARGS[@]}"; then


Respect command-specific timeouts in replay

replay.sh hard-caps every agent-browser command to 10 seconds via timeout "$CMD_TIMEOUT" ..., so recorded steps that intentionally request longer waits (for example wait --timeout 60000/120000 in the new extension recordings) will still be killed at 10s. This makes long-running LLM/tool steps fail even when the recording specifies a larger timeout, causing deterministic false failures in the new E2E workflow.
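
A minimal sketch of one way to honor step-level timeouts, assuming each recorded step carries its wait budget as `--timeout <ms>` in its argument list (as in the `wait --timeout 60000` example above). The helper name `step_timeout_s` and its grace parameter are hypothetical and not part of replay.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the outer timeout (in seconds) for one recorded step.
#   $1 = harness default in seconds, $2 = grace period in seconds,
#   remaining args = the step's recorded arguments.
step_timeout_s() {
  local default_s=$1 grace_s=$2
  shift 2
  local requested_ms=0
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--timeout" ] && [ "$#" -ge 2 ]; then
      case $2 in
        ''|*[!0-9]*) ;;                      # ignore non-numeric values
        *) requested_ms=$(( 10#$2 )) ;;      # force base 10 (guards leading zeros)
      esac
      shift
    fi
    shift
  done
  if [ "$requested_ms" -gt 0 ]; then
    # Round milliseconds up to whole seconds, then add the grace period.
    local requested_s=$(( (requested_ms + 999) / 1000 + grace_s ))
    if [ "$requested_s" -gt "$default_s" ]; then
      echo "$requested_s"
      return 0
    fi
  fi
  echo "$default_s"
}
```

With something like this, the outer cap could be computed per step, for example `timeout "$(step_timeout_s "$CMD_TIMEOUT" 5 "${ARGS[@]}")" ...`, so that a step asking for 60000 ms is not killed at 10 s.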

chatgpt-codex-connector · 2026-03-24T01:41:30Z

ui/desktop/src/main.ts

+  // In e2e test mode, auto-confirm dialogs so CDP-based tests can proceed
+  // without needing to interact with native OS dialogs.
+  if (process.env.ENABLE_PLAYWRIGHT) {
+    return { response: 1 };


Return the actual affirmative dialog response in E2E mode

In ENABLE_PLAYWRIGHT mode this handler always returns { response: 1 }, but several callers treat index 0 as the confirm action (e.g., KeyboardShortcutsSection conflict/reset dialogs). That means these dialogs are auto-cancelled instead of auto-confirmed during E2E runs, so test behavior diverges from user behavior and some flows become untestable.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bef76bc45

chatgpt-codex-connector · 2026-03-24T02:42:29Z

ui/desktop/tests/e2e-tests/scripts/e2e-start.sh

+    PORT=$((9300 + RANDOM % 100))
+    if ! lsof -i :"$PORT" &>/dev/null; then
+      echo "$PORT"


Reserve CDP ports atomically across parallel workers

pick_port does a check-then-use (lsof then echo) on a shared 9300–9399 range, but e2e-run-all.sh launches multiple tests concurrently (xargs -P "$WORKERS"). Two workers can select the same still-unbound port, which makes one session connect to the wrong Electron instance or fail nondeterministically once one process binds first. This creates cross-test contamination and flaky CI outcomes in the new parallel runner.
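
One way to make the reservation atomic is to claim a port by creating a lock file with an exclusive create, so that only one worker can win a given port. This is a sketch under assumptions: the function name `reserve_port` and the lock-directory layout are hypothetical, and it scans the range sequentially rather than picking randomly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: claim the first free port in [first, first + count)
# by atomically creating a lock file named after the port.
reserve_port() {
  local lock_dir=$1 first=$2 count=$3
  mkdir -p "$lock_dir"
  local offset port
  for (( offset = 0; offset < count; offset++ )); do
    port=$(( first + offset ))
    # With noclobber set, the redirect fails if the file already exists,
    # so two concurrent workers cannot both claim the same port.
    if ( set -o noclobber; : > "$lock_dir/$port" ) 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
  return 1   # every port in the range is already claimed
}
```

The lock only coordinates cooperating workers. It does not detect an unrelated process that already holds the port, so an `lsof` check can still be layered on top, and the lock file has to be removed at teardown.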

Addressed this.

chatgpt-codex-connector · 2026-03-24T02:42:29Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+  if ! CDP_PORT=$(wait_for_app "$TEST_NAME"); then
+    local DURATION=$(( SECONDS - START_TIME ))
+    echo "[$TEST_NAME] FAIL — app did not start within 30s (${DURATION}s)"
+    echo "FAIL ${DURATION}s" > "$STATUS_DIR/$TEST_NAME"
+    return


Always stop the test session on startup timeout

When app startup does not become ready within 30s, run_one returns early before reaching the teardown call (e2e-stop.sh). Since the app was launched in a detached screen session, this path can leave orphaned Electron/goosed processes and stale session artifacts, which then interfere with subsequent recordings in the same run (e.g., port conflicts and residual state).
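
The control flow being asked for can be sketched as follows. The function `run_one_sketch` and its callback parameters are hypothetical stand-ins for the start, replay, and stop scripts; the point is only that teardown runs on both the success and the failure path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: teardown is unconditional, whether startup or the test fails.
#   $1 = test name, $2 = start command, $3 = test command, $4 = stop command
run_one_sketch() {
  local name=$1 start_cmd=$2 test_cmd=$3 stop_cmd=$4
  local status=0
  if "$start_cmd" "$name"; then
    "$test_cmd" "$name" || status=$?
  else
    status=$?   # exit status of the failed start command
  fi
  # Always stop the session, even when startup timed out.
  "$stop_cmd" "$name" || true
  return "$status"
}
```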

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dcb343a45

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+  if [[ -n "$CDP_PORT" ]]; then
+    lsof -ti :"$CDP_PORT" 2>/dev/null | xargs kill -9 2>/dev/null || true
+    pkill -9 -f "remote-debugging-port=$CDP_PORT" 2>/dev/null || true


Stop goosed cleanly instead of SIGKILLing Electron

This teardown path force-kills the CDP listener with kill -9, which bypasses Electron shutdown hooks that normally call goosedResult.cleanup() in main.ts. Because goosed is spawned as goosed agent (no session path in argv), the later pattern kill in e2e-stop.sh does not reliably target it, so tests can leave orphan backend processes running and contaminate subsequent runs.

kill -9 is intentional to avoid hanging during cleanup. Orphan goosed processes are swept by pkill -9 -f "$BASE_DIR" at the end of e2e-stop.sh. We haven't seen orphan contamination in practice. Will revisit if it becomes an issue.

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+trap 'echo ""; echo "Interrupted, stopping active E2E sessions..."; cleanup_and_exit 130' INT TERM
+
+# Run recordings in parallel with worker limit
+printf '%s\n' "${RECORDINGS[@]}" | xargs -P "$WORKERS" -I {} bash -c "run_one '{}' '$STATUS_DIR'" || true


Gate parallel workers when single-instance lock applies

The runner always executes recordings in parallel, but the app still enforces app.requestSingleInstanceLock() on non-macOS in main.ts. On Linux/Windows, only one Electron instance can remain alive, so additional workers will fail startup and be reported as test failures; this needs a platform guard (or forcing --workers 1) to avoid deterministic false negatives outside macOS.
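
A platform guard of the kind suggested could look like the sketch below. The helper name `effective_workers` is hypothetical; it assumes `uname -s` reports `Darwin` on macOS and clamps every other platform to a single worker.

```shell
#!/usr/bin/env bash
# Hypothetical guard: allow parallel workers only on macOS.
#   $1 = requested worker count, $2 = OS name (defaults to `uname -s`)
effective_workers() {
  local requested=$1 os=${2:-$(uname -s)}
  if [ "$os" = "Darwin" ]; then
    echo "$requested"
  else
    echo 1   # the single-instance lock permits only one Electron instance
  fi
}
```

The runner could then pass `-P "$(effective_workers "$WORKERS")"` to `xargs`.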

We only support macOS at the moment.

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+  timeout "$TIMEOUT" bash "$SCRIPT_DIR/replay.sh" \
+    "$RECORDING" \


Respect scenario time budgets before test-level timeout

Each recording is wrapped in a fixed timeout "$TIMEOUT" (default 120s), but new recordings include waits up to 120s after earlier long waits (for example, extension-install-playwright.batch.json has 60s + 60s + 120s waits). This causes the harness to terminate otherwise valid long-running scenarios before their own declared timeouts elapse, producing avoidable TIMEOUT failures.

Reduced the timeout in the test.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8281334588

chatgpt-codex-connector · 2026-03-24T06:37:53Z

ui/desktop/src/main.ts

  const goosedResult = await startGoosed({
    serverSecret,
-    dir: dir || os.homedir(),
+    dir: process.env.GOOSE_WORKING_DIR || dir || os.homedir(),


Preserve requested chat directory over env default

createChat now always prefers process.env.GOOSE_WORKING_DIR over the per-window dir argument, so any flow that passes a specific directory (e.g., deep links/new-window creation with dir) is silently ignored whenever that env var is set. This regresses directory-scoped behavior in those launches: users/tests can request a different folder, but startGoosed still starts in the env folder. Restricting the env override to initial defaulting (or to explicit E2E mode) avoids breaking dir-driven window creation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0430864b99

chatgpt-codex-connector · 2026-03-24T11:41:45Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+
+pkill -9 -f "$BASE_DIR" 2>/dev/null || true
+pkill -9 -f 'agent-browser-chrome' 2>/dev/null || true
+pkill -9 -f 'agent-browser-darwin\|agent-browser-linux' 2>/dev/null || true


Use ERE alternation when killing agent-browser helpers

pkill patterns are interpreted as regular expressions (BSD/FreeBSD pgrep/pkill docs describe these as extended regex), so alternation should be |, not \|. With the current pattern, this command matches a literal pipe and does not match agent-browser-darwin or agent-browser-linux, leaving helper processes alive after teardown; those leftovers can interfere with later E2E runs on the same machine.
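
To illustrate the difference, the sketch below checks candidate command lines against an extended-regex pattern with plain `|` alternation. It uses bash's `[[ =~ ]]`, which also takes POSIX extended regular expressions, as a stand-in for `pkill -f` matching. The variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# ERE pattern with unescaped | alternation, grouped so the prefix is shared.
HELPER_PATTERN='agent-browser-(darwin|linux)'

# Succeeds if the given command line matches the helper pattern.
matches_helper() {
  [[ $1 =~ $HELPER_PATTERN ]]
}

# Intended use in teardown (not executed here):
#   pkill -9 -f "$HELPER_PATTERN" 2>/dev/null || true
```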

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0c9c54d77

chatgpt-codex-connector · 2026-03-24T11:56:18Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+    rm -f "$BASE_DIR/.port-locks/$CDP_PORT"
+  fi
+
+  pkill -9 -f "$SESSION_DIR" 2>/dev/null || true


Kill goosed explicitly for per-session teardown

This teardown path does not reliably terminate the backend process after force-killing Electron. stop_session only pattern-kills by "$SESSION_DIR", but startGoosed launches goosed as goosed agent with the session path only in cwd (not argv), so that match can miss; and in the normal runner path we call e2e-stop.sh "$TEST_NAME", which exits before the global fallback sweeps at the bottom of the script. Fresh evidence: ui/desktop/src/goosed.ts uses spawnArgs = ['agent'] and cwd: workingDir, so argv-based pkill -f "$SESSION_DIR" is not a reliable cleanup mechanism. When this miss happens, orphan goosed processes can leak into later tests and cause cross-test contamination/flakes.
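
One alternative to argv-based matching is to record the backend PID when it is launched and kill exactly that PID at teardown. The sketch below assumes the launcher is able to write the goosed PID to a file, which the PR does not show; the helper names and the PID-file path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical: record a backend PID at launch time.
record_pid() {
  local pid_file=$1 pid=$2
  echo "$pid" > "$pid_file"
}

# Hypothetical: kill the recorded PID at teardown and remove the PID file.
kill_recorded_pid() {
  local pid_file=$1
  [ -f "$pid_file" ] || return 0          # nothing recorded, nothing to do
  local pid
  pid=$(cat "$pid_file")
  case $pid in
    ''|*[!0-9]*) return 1 ;;              # refuse to act on a malformed PID
  esac
  kill -9 "$pid" 2>/dev/null || true      # the process may already be gone
  rm -f "$pid_file"
}
```

Because this targets a specific PID, it does not depend on the session path appearing in the process arguments.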

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 278f91582c

chatgpt-codex-connector · 2026-03-24T14:09:47Z

.github/workflows/pr-smoke-test.yml

+        uses: Swatinem/rust-cache@42dc69e1aa15d09112580998cf2ef0119e2e91ae  # v2
+
+      - name: Install GNU timeout (if missing)
+        run: command -v timeout || brew install coreutils


Make timeout available on macOS before running E2E

On macos-latest, this step only runs brew install coreutils when timeout is missing, but the replay harness later calls timeout directly for every command. Homebrew coreutils installs gtimeout unless the gnubin path is added, so in the common case where timeout is absent the E2E run can still fail with timeout: command not found before tests execute. Please either export the coreutils gnubin directory (or symlink) here, or update the scripts to fall back to gtimeout.
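
The `gtimeout` fallback mentioned above could be sketched as a small resolver that the scripts call once. The function name `resolve_timeout_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whichever timeout binary is on PATH.
# Prefers GNU `timeout`, then falls back to Homebrew coreutils' `gtimeout`.
resolve_timeout_cmd() {
  if command -v timeout >/dev/null 2>&1; then
    echo timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    echo gtimeout
  else
    return 1   # neither binary is available
  fi
}
```

A script could then run `TIMEOUT_BIN=$(resolve_timeout_cmd) || exit 1` near the top and invoke `"$TIMEOUT_BIN"` wherever it currently calls `timeout`.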

chatgpt-codex-connector · 2026-03-24T14:09:47Z

workflow_recipes/debug_e2e_failures/recipe.yaml

+    gh run download {run_id} --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
+    # example: result_url: https://github.com/block/goose/actions/runs/23487840554/artifacts/6079302235
+    # gh run download 23487840554 --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
+    rm -rf ui/desktop/tests/e2e-tests/results-ci
+    unzip -o /tmp/e2e-results.zip -d ui/desktop/tests/e2e-tests/results-ci


Drop nonexistent unzip step after gh artifact download

When results_url is provided, these instructions call gh run download ... --dir /tmp/e2e-results and then immediately unzip /tmp/e2e-results.zip. gh run download already extracts artifacts into the target directory, so this zip file is not produced; the extra unzip command will fail and block Step 1 for CI-artifact debugging runs.
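
Since the artifact is already extracted into the `--dir` target, the follow-up step only needs to replace the results directory with a copy of it. A minimal sketch, with a hypothetical helper name `import_results`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace dest with a fresh copy of the extracted artifact dir.
#   $1 = directory that `gh run download --dir` extracted into
#   $2 = destination results directory
import_results() {
  local src=$1 dest=$2
  rm -rf "$dest"                      # drop stale results from earlier runs
  mkdir -p "$(dirname "$dest")"
  cp -R "$src" "$dest"                # dest does not exist, so it becomes a copy of src
}

# Intended use after the download step (not executed here):
#   gh run download "$run_id" --repo block/goose --name e2e-test-results --dir /tmp/e2e-results
#   import_results /tmp/e2e-results ui/desktop/tests/e2e-tests/results-ci
```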

lifeizhou-ap added 12 commits March 18, 2026 12:47

install agent browser

194945b

instructions to write e2e test

b127a87

first test

f850b18

e2e test setup script

472ccf5

add skills and generated e2e test

5144227

scripts to run all recorded tests

f9f2259

adjust the skills

bdb89fa

reorganised folder

300eece

custom extension test

26db973

adding more tests

f90c636

enable running e2e test in ci

aa0822e

github-advanced-security bot found potential problems Mar 23, 2026

.github/workflows/pr-smoke-test.yml Fixed

.github/workflows/pr-smoke-test.yml Fixed

lifeizhou-ap added 7 commits March 24, 2026 10:37

resolve pnpm install

99ed02f

updated tests and scripts

1a801c8

added action sha

ec54a48

fixed the test and setup

22c4dd1

install timeout

3281469

update justfile

b1fc2bf

lifeizhou-ap marked this pull request as ready for review March 24, 2026 01:36